Reenacting Transactions to Compute their Provenance
Authors
Abstract
Database provenance is essential for auditing, data debugging, understanding transformations, and many additional use cases. While these applications do benefit from state-of-the-art provenance tracking for queries, most use cases also require provenance for transactional updates. We present the first provenance model for concurrent database transactions. Our model extends the well-known semiring provenance framework with version annotations and update operations. Based on this model, we present the first solution for computing the provenance of database transactions. Our approach can retroactively trace transaction provenance without storing any additional information, as long as an audit log and time travel functionality are available (both are supported by most DBMSs). For a given transaction, our approach constructs a reenactment query that simulates the effect of the transaction. This query is guaranteed to produce the updated versions of the tables produced by the transaction and has the same provenance as the original transaction, i.e., it is annotation-equivalent. Using time travel and by adopting well-known techniques for computing the provenance of queries, we can use reenactment to retroactively compute the provenance of transactions. Currently, we support two widely applied concurrency control mechanisms: snapshot isolation and read committed snapshot isolation. We have implemented a prototype on top of a commercial database system, and our experiments confirm that 1) the runtime and storage overhead required to support time travel and the audit log is tolerable and 2) by applying novel optimizations we can efficiently compute the provenance of large transactions over large data sets.
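The reenactment idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a hypothetical `account` table, uses a plain copy (`account_before`) as a stand-in for time travel, and rewrites each `UPDATE` by hand into a `SELECT` with a `CASE` expression. It only checks that the reenactment query reproduces the table state left by the transaction; it does not model version annotations or provenance.

```python
import sqlite3


def reenact_demo():
    """Run a two-statement transaction, then reproduce its result with one query."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical table used only for illustration.
    cur.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, bal INTEGER)")
    cur.executemany(
        "INSERT INTO account VALUES (?, ?)", [(1, 500), (2, 300), (3, 50)]
    )

    # Stand-in for time travel: keep the table version the transaction started from.
    cur.execute("CREATE TABLE account_before AS SELECT * FROM account")

    # The original transaction (in a real system these statements
    # would be recovered from the audit log).
    cur.execute("UPDATE account SET bal = bal - 100 WHERE id = 1")
    cur.execute("UPDATE account SET bal = bal + 100 WHERE id = 2")
    conn.commit()
    actual = cur.execute("SELECT id, bal FROM account ORDER BY id").fetchall()

    # Reenactment query: each UPDATE becomes a projection over the previous
    # version; the CASE keeps rows that do not match the WHERE clause unchanged.
    reenactment_sql = """
        SELECT id,
               CASE WHEN id = 2 THEN bal + 100 ELSE bal END AS bal
        FROM (
            SELECT id,
                   CASE WHEN id = 1 THEN bal - 100 ELSE bal END AS bal
            FROM account_before
        )
        ORDER BY id
    """
    reenacted = cur.execute(reenactment_sql).fetchall()
    conn.close()
    return actual, reenacted


if __name__ == "__main__":
    actual, reenacted = reenact_demo()
    print("after transaction:", actual)
    print("reenacted query:  ", reenacted)
```

Because the reenactment is an ordinary query over an earlier table version, existing techniques for query provenance could in principle be applied to it, which is the retroactive step the abstract describes.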
Similar resources
Formal Foundations of Reenactment and Transaction Provenance
Provenance is essential for auditing, data debugging, understanding transformations, and many additional use cases. All these use cases would benefit from provenance for transactional updates. We present a provenance model for snapshot isolation transactions extending the semiring framework with version annotations and updates. Based on this model, we present the first solution for computing th...
Computational provenance in hydrologic science: a snow mapping example.
Computational provenance--a record of the antecedents and processing history of digital information--is key to properly documenting computer-based scientific research. To support investigations in hydrologic science, we produce the daily fractional snow-covered area from NASA's moderate-resolution imaging spectroradiometer (MODIS). From the MODIS reflectance data in seven wavelengths, we estima...
A Generic Provenance Middleware for Database Queries, Updates, and Transactions
We present an architecture and prototype implementation for a generic provenance database middleware (GProM) that is based on the concept of query rewrites, which are applied to an algebraic graph representation of database operations. The system supports a wide range of provenance types and representations for queries, updates, transactions, and operations spanning multiple transactions. GProM...
Debugging Transactions and Tracking their Provenance with Reenactment
Debugging transactions and understanding their execution are of immense importance for developing OLAP applications, to trace causes of errors in production systems, and to audit the operations of a database. However, debugging transactions is hard for several reasons: 1) after the execution of a transaction, its input is no longer available for debugging, 2) internal states of a transaction ar...
Semantic Representation of Provenance in Wikipedia
Wikis are often considered as being a wide source of information. However, identifying provenance information about their content is crucial, whether it is for computing trust in public wiki pages or to identify experts in corporate wikis. In this paper, we address this issue by providing a lightweight ontology for provenance management in wikis, based on the W7 model. Furthermore, we showcase ...
Publication date: 2014